Deriving Cluster Analytic Distance Functions from Gaussian Mixture Models

Author

  • Michael E. Tipping
Abstract

The reliable detection of clusters in datasets of non-trivial dimensionality is notoriously difficult. Clustering algorithms are generally driven by some distance function (usually Euclidean) defined over pairs of examples, which implicitly treats distances within and between clusters alike. In this paper, a more effective distance measure is proposed, derived from an a priori estimated Gaussian mixture model. Examples illustrate how the proposed approach can effectively de-emphasise within-cluster structure, and thus implicitly magnify the separation between regions of high data density.
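
As a rough illustration of the idea in the abstract, the sketch below builds a pairwise distance from a Gaussian mixture whose parameters are assumed to be already estimated. It is not the formula from the paper: it assumes a simplified form in which the component inverse covariances are averaged using the posterior responsibilities at the midpoint of the two points. The function names `responsibilities` and `gmm_distance` and the toy mixture are hypothetical and chosen only for this example.

```python
import numpy as np

def responsibilities(x, weights, means, covs):
    """Posterior p(k | x) under a Gaussian mixture, computed in log space."""
    log_p = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        log_p.append(np.log(w) - 0.5 * (logdet + maha + len(x) * np.log(2 * np.pi)))
    log_p = np.array(log_p)
    log_p -= log_p.max()          # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum()

def gmm_distance(xi, xj, weights, means, covs):
    """Illustrative mixture-derived distance (an assumption, not the paper's formula).

    The local metric is the responsibility-weighted average of the component
    inverse covariances, evaluated at the midpoint of xi and xj.
    """
    mid = 0.5 * (xi + xj)
    r = responsibilities(mid, weights, means, covs)
    G = sum(rk * np.linalg.inv(cov) for rk, cov in zip(r, covs))
    d = xi - xj
    return float(np.sqrt(d @ G @ d))

# Toy mixture: two clusters elongated along the first axis, stacked along the second.
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([0.0, 5.0])]
covs = [np.diag([9.0, 0.25]), np.diag([9.0, 0.25])]

# Both pairs are 4 units apart in Euclidean terms.
within = gmm_distance(np.array([-2.0, 0.0]), np.array([2.0, 0.0]), weights, means, covs)
between = gmm_distance(np.array([0.0, 0.5]), np.array([0.0, 4.5]), weights, means, covs)
print(within, between)
```

In this toy setting the pair lying along the long axis of one cluster comes out much closer than the pair spanning the gap between clusters, although their Euclidean distances are equal. This is the qualitative effect the abstract describes: within-cluster structure is de-emphasised and between-cluster separation is magnified.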

Similar resources

An Analytic Distance Metric for Gaussian Mixture Models with Application in Image Retrieval

In this paper we propose a new distance metric for probability density functions (PDFs). The main advantage of this metric is that, unlike the popular Kullback-Leibler (KL) divergence, it can be computed in closed form when the PDFs are modeled as Gaussian Mixtures (GM). The application in mind for this metric is histogram-based image retrieval. We experimentally show that in an image retrieval sc...

Discovering natural kinds of robot sensory experiences in unstructured environments

We address the symbol grounding problem for robot perception through a data-driven approach to deriving sensor categories. Unlike model-based approaches, our method learns intrinsic categories (or natural kinds) from the raw data itself. We approximate a manifold underlying sensor data using Isomap nonlinear dimension reduction and apply Bayesian clustering (Gaussian mixture models) to discover...

Constructing family trees of multilingual speech using Gaussian mixture models

This paper proposes a method for automatically clustering multilingual speech so as to derive language family trees. We consider that the language is the source of information which generates speech feature parameters; the probability or statistical characteristics of this information is modeled by Gaussian mixture models (GMMs); then a distance measure between the GMMs is introduced. Based on ...

Unsupervised Classification of Functions using Dirichlet Process Mixtures of Gaussian Processes

This technical report presents a novel algorithm for unsupervised clustering of functions. It proceeds by developing the theory of unsupervised classification in mixtures from the familiar mixture of Gaussian distributions to the infinite mixture of Gaussian processes. At each stage, both a theoretical and an algorithmic exposition are presented. We consider unsupervised classification (or cl...

The Cluster-Weighted Framework and the Multilevel Models

Density estimation through function fitting allows us to estimate densities outside the measured points, but it does not satisfactorily handle many kinds of situations. First, it is usually hard to express local structure with a global functional form. Another problem comes from the kind of functions that we need to represent. Consider points distributed on a low-dimensional manifold embedde...

Publication date: 1999